On Determining the Most Effective Subset of Features for Detecting Phishing Websites

Authors

  • Doaa Hassan
  • Ram B. Basnet
  • Andrew H. Sung
  • Gerhard Paaß
  • Frank Reichartz
  • Siehyun Strobel
  • Aaron Blum
  • Brad Wardman
  • Thamar Solorio
  • Weibo Chu
  • Bin B. Zhu
  • Feng Xue
  • Xiaohong Guan
  • Ian Fette
  • Norman Sadeh
  • Mark Hall
  • Eibe Frank
  • Geoffrey Holmes
  • Bernhard Pfahringer
  • Peter Reutemann
  • Ian H. Witten

Abstract

Phishing websites mimic legitimate ones in order to steal users' confidential information, such as usernames, passwords and credit card details. Recently, machine learning and data mining techniques have emerged as a promising approach for detecting phishing websites by distinguishing them from legitimate ones. In this approach, detection is preceded by extracting various features from a website dataset, which are then used to train a classifier to correctly identify phishing sites. However, not all extracted features are effective for classification, nor do they contribute equally to its performance. In this paper, we investigate the effect of feature selection on classification performance when predicting phishing sites. We evaluate various machine learning algorithms on a number of feature subsets, chosen from an extracted feature set by different feature selection techniques, in order to determine the subset of features that yields the best classification performance. Empirical results show that, using our newly proposed methodology, which selects features by removing redundant ones that contribute equally to the classification accuracy, the decision tree classifier achieves the best performance, with an overall accuracy of 95.40%, a false positive rate (FPR) of 0.046 and a false negative rate (FNR) of 0.065.
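The abstract reports accuracy, FPR and FNR together. As a minimal sketch, assuming "positive" means a site labelled phishing, the three figures can be derived from a binary confusion matrix as shown below. The names `ConfusionCounts` and `detection_metrics` are hypothetical helpers introduced for illustration, and the counts in the usage example are invented; they are not the paper's data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical confusion-matrix counts; 'positive' = phishing."""
    tp: int  # phishing sites correctly flagged as phishing
    fn: int  # phishing sites missed (classified as legitimate)
    fp: int  # legitimate sites wrongly flagged as phishing
    tn: int  # legitimate sites correctly classified as legitimate


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return overall accuracy, false positive rate and false negative rate."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        # Share of all sites classified correctly
        "accuracy": (c.tp + c.tn) / total,
        # FPR: share of legitimate sites misclassified as phishing
        "fpr": c.fp / (c.fp + c.tn),
        # FNR: share of phishing sites misclassified as legitimate
        "fnr": c.fn / (c.fn + c.tp),
    }


# Usage with invented counts: 1000 phishing and 1000 legitimate sites
metrics = detection_metrics(ConfusionCounts(tp=935, fn=65, fp=46, tn=954))
print(metrics)
```

Because FPR and FNR are normalised by the size of each class separately, overall accuracy also depends on the class balance of the dataset, so the same FPR and FNR can correspond to different accuracies on different datasets.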

Similar resources

Detecting Fake Websites Using Swarm Intelligence Mechanism in Human Learning

The internet and its various services have enabled users to communicate with each other easily. Internet benefits include online business and e-commerce. E-commerce has boosted online sales and online auctions. Despite their many uses and benefits, the internet and its services face various challenges, such as information theft, which hinder the use of these services. Information thef...

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether a webpage is a phishing webpage or not. Selecting appropriate features improves the performance of a PWDS. The performance criteria are detection accuracy and system response time. Most of the time consumed by a PWDS arises from feature extraction that ...

Phishing Website Detection based on Supervised Machine Learning with Wrapper Features Selection

The problem of Web phishing attacks has grown considerably in recent years, and phishing is considered one of the most dangerous Web crimes, which may cause tremendous negative effects on online business. In a Web phishing attack, the phisher creates a forged or phishing website to deceive Web users in order to obtain their sensitive financial and personal information. Several conventiona...

A Classification Model for Detection of Chinese Phishing E-Business Websites

An increasing number of fake e-Business websites have been created and used, resulting in rising financial losses for online consumers and businesses. Therefore, developing effective approaches to detecting phishing websites is essential to mitigating the possibility of being victimized by those sites and to minimizing financial loss and risk. In this research, we propose a novel cla...

An Associative Classification Data Mining Approach for Detecting Phishing Websites

Phishing websites are fake websites created by dishonest people to mimic the webpages of real websites. Victims of phishing attacks may expose their sensitive financial information to attackers, who might use this information for financial and criminal activities. Various approaches have been proposed to detect phishing websites, among which approaches that utilize data mining techniqu...

Publication date: 2015